Summary: Two sets of modified kernel estimates of a
regression function are proposed: one when a bound on
the regression function is known and the other when
nothing of this sort is at hand. Explicit bounds on
the mean square errors of the estimators are obtained.
Pointwise as well as uniform consistency in mean square
and consistency in probability of the estimators are
proved. Speed of convergence in each case is investigated.




1.  INTRODUCTION

The theory of regression is concerned with the prediction
of the value of a variable, called the response
or dependent variable, at a given value of another
(correlated) variable, called the predictor or independent
variable. Prediction is needed in several practical
situations. For example, an agriculturist wants to know
the yield of wheat at a given amount of a specified fertilizer,
a meteorologist wants to forecast weather several hours
ahead on the basis of previous atmospheric measurements,
and a physician is interested in determining the weight
of a patient in terms of the number of weeks he or she
has been on a diet.

Let us denote the response variable by Y and the
predictor variable (also known as regressor variable) by
X. Then the regression of Y on X evaluated at X = x
is given by

$r(x) = E(Y \mid X = x)$.


It  is  well  known  that  the  regression  curve  r(X)  of  Y 
on  X  is  the  best  predictor  of  Y  in  terms  of  X  in  the 
sense  that  if  t(X)  is  any  other  predictor  of  Y,  then 
the  average  squared  error  incurred  due  to  predictor  t(X) 
is  not  smaller  than  that  incurred  due  to  predictor  r(X) . 

If the joint distribution of the two variables X and
Y is known, then the prediction of Y can be made by
computing the conditional expectation of Y at the desired
value of X. Otherwise, the regression curve r(x) is not
directly available to us. In such situations, if observations
$(X_1,Y_1),\ldots,(X_n,Y_n)$ on (X,Y) are at hand, then
sometimes the theory of least squares methods or that of
maximum likelihood methods can be applied to estimation of
r(x), but this may be done only if the exact model (the
functional form) of the regression curve is known, and,
further, for the use of m.l. methods, the distribution of
the errors

$e_j = Y_j - E(Y_j \mid X_j)$

must also be known.

However, the population of all suitable functional
forms (or of the distributions of errors) is quite often
impractically large. Therefore, no matter how carefully
a model is chosen, there is always a possibility
of misspecification. Moreover, even if the exact functional
form of the regression model involving unknown parameters
is known (which is extremely rare), the above methods of
least squares and/or of maximum likelihood sometimes
do not work at all. This is especially the case when the
model is a mixture of polynomial, exponential, reciprocal,
logarithmic, trigonometric and/or similar functions of the
regressor variables, each involving unknown parameters.

The problems of estimation of a regression curve r when
nothing is known about the functional form of r but the
conditional density of Y given X = x is known to belong
to a certain class of densities have been treated by Kale
(1962), Nadaraya (1964, 1965), Singh and Tracy (1977) and
Singh (1980). Whereas in the first three of these papers
the conditional density of Y given X = x is normal with
mean x and variance one, and the unconditional distribution
function of X possesses a density, in the last two
papers the density of Y given X = x is of the form
$C(y)u(x)e^{-yx}$ and $C(y)u(x)e^{-y/x}$ respectively and the
distribution  of  X  need  not  possess  a  density.  However, 
the  methods  cited  in  these  works  are  too  restrictive  and 
may  also  lead  to  misspecification  of  the  model,  because  the 
conditional density of Y given X = x is rarely known or
may  incorrectly  be  specified. 

The only way of avoiding misspecification of the functional
form of the regression model or of the distributional
form of the errors is, in fact, to assume no specific parametric
functional form of the model or of the distribution
of errors, that is, to estimate the regression function completely
nonparametrically. Nadaraya (1964), Watson (1964), Rosenblatt
(1961), Noda (1976) and Collomb (1977, 1979) are among the
first to consider estimating the regression function r by $r_n$ (defined below)
nonparametrically using Rosenblatt (1956) - Parzen (1962)
type  kernel  estimates  of  a  density  function.  Various 
asymptotic  properties  of  these  estimates,  known  as  kernel 
estimates  of  a  regression  function,  have  been  studied  in 
the  literature  by  a  number  of  authors  including  the  above 
authors  as  well  as  by  Nadaraya  (1974),  Konakov  (1977)  and 
Révész (1979). Schuster (1972) proved the asymptotic
normality  of  these  estimates  whereas  Noda  (1976)  proved  the 
pointwise  strong  consistency  and  Collomb  (1979),  Devroye  (1979),  Wandl 
(1980)  and  Mack  and  Silverman  (1982)  proved  uniform  strong 
consistency. Devroye and Wagner (1980)
and Spiegelman and Sacks (1980) proved $L_p$ convergence of
$r_n$ in the sense that $\lim_{n\to\infty} E\int |r_n(x)-r(x)|^p \, d\mu(x) = 0$, where
$\mu$ is the probability measure generated by the r.v. X. However,
strong convergence (pointwise or uniform) and $L_p$
convergence concepts differ from the pointwise and/or
uniform mean square consistency concept we shall deal with.

Moreover, the kernel estimates $r_n$ of a regression function based
on a sample $\{(X_1,Y_1),\ldots,(X_n,Y_n)\}$ on (X,Y) considered in
the above and other works are defined by $r_n = h_n/g_n$, where

$h_n(x) = (n\delta)^{-1} \sum_{j=1}^{n} Y_j K((X_j - x)/\delta)$ and

$g_n(x) = (n\delta)^{-1} \sum_{j=1}^{n} K((X_j - x)/\delta)$,

with K and $\delta$ being respectively the kernel and the windowwidth functions.
Hence with such an estimate, since the kernel function K
could assume a zero, negative or positive value, there is
always a chance of blowing up the estimate $h_n/g_n$ itself
(or of excessively overestimating the regression) in
practice for any given set of data whenever $g_n$ is near
zero. To avoid this problem, in this paper we consider a
modified kernel estimate which is a retraction of the
function $h_n/g_n$ to an interval $[-c_n, c_n]$ with $c_n$ converging
to infinity at a certain rate.

In Section 2 we introduce our modified kernel estimate
of the regression function. In Section 3 we prove pointwise
mean square consistency and deduce from it the weak
consistency of our estimates. In each case the speed of
convergence is examined. An explicit bound for the mean
square error, lacking to date in the literature for the
kernel type regression estimates, is also obtained. In
Section 4 uniform mean square and uniform weak consistencies
are proved and their speeds of convergence are investigated.
In Section 5 remarks are made on the choice of windowwidth
function, kernel function and the sequence $\{c_n\}$.

Throughout this paper convergence of a function depending
on n is w.r.t. $n \to \infty$. Integrals shown without
limits are over the whole real line.


2.  ESTIMATORS  OF  REGRESSION  CURVES 


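Let f denote the joint density of (X,Y), and write
$g(x) = \int f(x,y)\,dy$ and $h(x) = \int y f(x,y)\,dy$.
Then the regression curve of Y on X evaluated at
X = x is

(2.1)  $r(x) = E(Y \mid X = x) = h(x)/g(x)$, provided $g(x) \neq 0$.

Our method of estimation of r involves estimation of h
and g on the basis of the random sample $\{(X_1,Y_1),\ldots,(X_n,Y_n)\}$
on (X,Y).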

Let s be a positive integer and $\mathcal{K}_s$ be the class of
all real valued Borel measurable bounded functions K such
that

(2.2)  $\int K(y)\,dy = 1$, $\int y^j K(y)\,dy = 0$ for $j = 1,\ldots,s-1$,
$\int |y|^s |K(y)|\,dy < \infty$ and $|yK(y)| \to 0$ as $|y| \to \infty$.

Kernels of the type (2.2) have been used in density estimates
by Johns and Van Ryzin (1972), and Singh (1977 and 1981),
among others. For any given s, the class $\mathcal{K}_s$ is quite
large. For example, for s = 1 and 2,

$K(y) = (2\pi)^{-1/2}\exp(-y^2/2)\,I(-\infty < y < \infty)$ or
$K(y) = (2a)^{-1} I(-a < y < a)$ for an a > 0

belong to $\mathcal{K}_s$. For s = 3 and 4, the functions

$K(y) = (2\pi)^{-1/2}[2\exp(-y^2/2) - (1/\sqrt{2})\exp(-y^2/4)]\,I(-\infty < y < \infty)$ or
$K(y) = (2\pi)^{-1/2}(1/2)(3-y^2)\exp(-y^2/2)\,I(-\infty < y < \infty)$

belong to $\mathcal{K}_s$. For any given s, polynomials K(y) in y on a
finite interval (a,b) belonging to $\mathcal{K}_s$ can be constructed
(e.g. see Singh (1981)).
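
The moment conditions in (2.2) can be checked numerically for the four example kernels. The following is a minimal sketch, not part of the original paper: the function names, the midpoint-rule integrator and its truncation to [-12, 12] are illustrative assumptions, and the second Gaussian term of the first s = 3, 4 kernel is taken with coefficient $1/\sqrt{2}$, the value for which that kernel integrates to one and has vanishing second moment.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)

def gaussian(y):
    # Standard normal density: example kernel for s = 1 and 2.
    return math.exp(-y * y / 2.0) / SQRT_2PI

def uniform(y, a=1.0):
    # Uniform density on (-a, a): example kernel for s = 1 and 2.
    return 1.0 / (2.0 * a) if -a < y < a else 0.0

def twiced_gaussian(y):
    # 2 * N(0,1) density minus N(0,2) density: example kernel for s = 3 and 4.
    return (2.0 * math.exp(-y * y / 2.0)
            - math.exp(-y * y / 4.0) / math.sqrt(2.0)) / SQRT_2PI

def hermite_type(y):
    # (1/2)(3 - y^2) times the N(0,1) density: example kernel for s = 3 and 4.
    return 0.5 * (3.0 - y * y) * math.exp(-y * y / 2.0) / SQRT_2PI

def moment(kernel, j, lo=-12.0, hi=12.0, steps=48000):
    # Midpoint-rule approximation of the integral of y^j K(y) over [lo, hi].
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * width
        total += (y ** j) * kernel(y)
    return total * width

if __name__ == "__main__":
    for name, k in [("gaussian", gaussian), ("uniform", uniform),
                    ("twiced_gaussian", twiced_gaussian),
                    ("hermite_type", hermite_type)]:
        print(name, [round(moment(k, j), 6) for j in range(5)])
```

For the two fourth-order kernels the moments of order 1, 2 and 3 vanish while the fourth moment does not, which is what membership in the class for s = 3 and 4 requires.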

Let $\delta = \delta_n$ and $\eta = \eta_n$ be two positive sequences of numbers
based on the sample size n so that $\max\{\delta_n, \eta_n\} \to 0$ as
$n \to \infty$. Let x be a point at which we wish to estimate
r(x). For a fixed s, let K be a fixed member of $\mathcal{K}_s$. Let

(2.3)  $\hat h(x) = (n\delta)^{-1} \sum_{j=1}^{n} Y_j K((X_j - x)/\delta)$,

(2.4)  $\hat g(x) = (n\eta)^{-1} \sum_{j=1}^{n} K((X_j - x)/\eta)$,

(2.5)  $r_n(x) = \hat h(x)/\hat g(x)$.


In the existing literature the kernel type estimates
of the regression curve (excluding those of the type considered
in Priestley and Chao (1972), Bhattacharya (1976), Benedetti (1977),
Stone (1977) and Wahba (1978)) are exactly of the type
(2.5). However, as noted earlier, $\hat g(x)$ could be zero
or near zero at a number of points x for any given set
of data on (X,Y) with a number of symmetric kernels K.
In such situations it is hardly advisable to use $r_n$ as
an estimate of r. To avoid such problems, in this paper
we propose a retraction of $r_n$ and study pointwise as
well as uniform consistencies.


For a positive b, let $\{a\}_b$ stand for -b, a or b
according as $a < -b$, $|a| \le b$ or $a > b$. Let $c_n = c_n(x)$ be a
positive function of n and x which for each x converges
to infinity as $n \to \infty$. Our proposed estimator of
r(x) is

(2.6)  $\hat r(x) = \left\{ \dfrac{\delta^{-1} \sum_{j=1}^{n} Y_j K((X_j - x)/\delta)}{\eta^{-1} \sum_{j=1}^{n} K((X_j - x)/\eta)} \right\}_{c_n}$.
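
As an illustration of (2.3)-(2.6), the sketch below computes the two kernel sums and the retracted ratio in plain Python. It is a hedged sketch rather than the authors' implementation: the function names and the Gaussian kernel default are assumptions, and so is the convention of returning 0 when the denominator estimate is exactly zero, a case the text does not specify.

```python
import math

def gaussian_kernel(t):
    # Standard normal density, one admissible kernel for s = 1 and 2.
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def retract(a, b):
    # {a}_b: equals -b, a or b according as a < -b, |a| <= b or a > b.
    return max(-b, min(a, b))

def h_hat(x, xs, ys, delta, kernel=gaussian_kernel):
    # (2.3): (n * delta)^{-1} * sum_j Y_j K((X_j - x) / delta)
    n = len(xs)
    return sum(y * kernel((xj - x) / delta) for xj, y in zip(xs, ys)) / (n * delta)

def g_hat(x, xs, eta, kernel=gaussian_kernel):
    # (2.4): (n * eta)^{-1} * sum_j K((X_j - x) / eta)
    n = len(xs)
    return sum(kernel((xj - x) / eta) for xj in xs) / (n * eta)

def r_hat(x, xs, ys, delta, eta, c_n, kernel=gaussian_kernel):
    # (2.6): the ratio h_hat / g_hat retracted to [-c_n, c_n].
    h = h_hat(x, xs, ys, delta, kernel)
    g = g_hat(x, xs, eta, kernel)
    if g == 0.0:
        # Ratio undefined; returning 0.0 is an illustrative convention only.
        return 0.0
    return retract(h / g, c_n)

if __name__ == "__main__":
    xs, ys = [0.0, 1.0], [1.0, 3.0]
    print(r_hat(0.5, xs, ys, delta=1.0, eta=1.0, c_n=10.0))  # unclipped ratio
    print(r_hat(0.5, xs, ys, delta=1.0, eta=1.0, c_n=1.5))   # clipped at c_n
```

When a bound on the regression function is known, the same function can be called with that bound in place of `c_n`.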


However, if we have the knowledge of some function $c_0(x)$
such that $-c_0(x) \le r(x) \le c_0(x)$, our proposed estimator
of r(x) would be

$r^*(x) = \left\{ \dfrac{\delta^{-1} \sum_{j=1}^{n} Y_j K((X_j - x)/\delta)}{\eta^{-1} \sum_{j=1}^{n} K((X_j - x)/\eta)} \right\}_{c_0}$.

A discussion on the choice of $c_n$, the bandwidth
functions $\delta$ and $\eta$ and the kernel function K is made
in Section 5.


3.  POINTWISE  CONSISTENCIES  WITH  AN  UPPER 
BOUND  FOR  MEAN  SQUARE  ERRORS 




In this section we prove the pointwise mean square
consistency (and hence also the consistency in probability)
of our estimators $\hat r$ and $r^*$, and obtain the speed of
convergence in each case. In the sequel we prove also the
mean square consistency of $\hat h$ and $\hat g$ as estimators of h
and g and establish the speed of convergence. An explicit
bound for the mean square errors of $\hat r$ and $r^*$ is also
obtained.

We denote $g_s(x) = \int f^{(s,0)}(x,y)\,dy$, where
$f^{(s,0)}(x,y) = \partial^s f(x,y)/\partial x^s$, $h_s(x) = \int y f^{(s,0)}(x,y)\,dy$ and
$p(x) = \int y^2 f(x,y)\,dy$. Under certain regularity conditions
$g_s$ and $h_s$ are the $s$th derivatives of g and h.
We however make no such regularity assumptions. Whenever
there is no ambiguity, we will not display the argument x
in r(x), $\hat r(x)$, $r^*(x)$, $c_n(x)$, $\hat h(x)$, $\hat g(x)$, $h_s(x)$, $g_s(x)$ and
p(x) throughout this paper.

Theorem 3.1. Let $h_s$, $g_s$ and p be continuous at x and
g(x) > 0. Then

(3.0)  $E(\hat r(x) - r(x))^2 = O(c_n^2 \gamma_n)$

where

$\gamma_n = \max\{\delta^{2s}, \eta^{2s}, (n\delta)^{-1}, (n\eta)^{-1}\}$.

To prove the theorem we will need three lemmas, the first
of which is due to Singh (1977b).

Lemma 3.1. If g in the definition of r is not zero,
then for every L > 0,

(3.1)  $E(|\hat h/\hat g - r| \wedge L)^2 \le 8 g^{-2} [E(\hat h - h)^2 + (|r|^2 + L^2/2) E(\hat g - g)^2]$.

Proof. The inequality is a special case of the lemma in
the Appendix of Singh (1977b) and hence it does not need a
separate proof.

In the next two lemmas we prove the mean square consistencies
of $\hat h$ as an estimator of h and of $\hat g$ as an
estimator of g respectively, and in each case we obtain
rates of convergence. With some choices of $\delta$ and $\eta$
these rates are of the order $O(n^{-2s/(1+2s)})$, and hence
can be made arbitrarily close to $O(n^{-1})$ by taking s
sufficiently large (subject to (2.2)).

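Lemma 3.2. Let $h_s$ and p be continuous at x. Then the
asymptotic behaviour of the mean square error of $\hat h$ at x
is given by

(3.2)  $\mathrm{MSE}(\hat h(x)) = E(\hat h(x) - h(x))^2 \sim \left[ \left( \dfrac{\delta^s}{s!} h_s(x) \int t^s K(t)\,dt \right)^2 + (n\delta)^{-1} p(x) \int K^2 \right]$.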


Proof. We first obtain the asymptotic behaviors of $E\hat h$
and var($\hat h$). Then we combine these to obtain (3.2).

Since $(X_1,Y_1),\ldots,(X_n,Y_n)$ are i.i.d. with joint
density f, from (2.3), we can write

(3.3)  $E\hat h(x) = \int\int y K(t) f(x+\delta t, y)\,dt\,dy$.

Now expanding $f(x+\delta t, y)$ at (x,y) in $\delta t$ by Taylor
series expansion with the integral form of the remainder,
we write

$f(x+\delta t, y) = \sum_{j=0}^{s-1} \dfrac{(\delta t)^j}{j!} f^{(j,0)}(x,y) + \dfrac{1}{(s-1)!} \int_x^{x+\delta t} (x+\delta t-u)^{s-1} f^{(s,0)}(u,y)\,du$.

In view of this expansion and the orthogonality properties
(2.2) of K we get from (3.3),

(3.4)  $E\hat h(x) = \int y f(x,y)\,dy + \int\int y K(t) \left[ \dfrac{1}{(s-1)!} \int_x^{x+\delta t} (x+\delta t-u)^{s-1} f^{(s,0)}(u,y)\,du \right] dt\,dy$.

Thus,

(3.5)  $\delta^{-s}(E\hat h(x) - h(x)) = \dfrac{\delta^{-s}}{(s-1)!} \int\int y K(t) \int_x^{x+\delta t} (x+\delta t-u)^{s-1} f^{(s,0)}(u,y)\,du\,dt\,dy$.

But since x is a point of continuity of $h_s(x) = \int y f^{(s,0)}(x,y)\,dy$ and
K is bounded with $|yK(y)| \to 0$ as $|y| \to \infty$, by arguments
used in Singh (1977a) or in Menon, Prasad and Singh (1981),
the rhs of (3.5) is, as $n \to \infty$, asymptotically equivalent to

$\dfrac{\delta^{-s}}{(s-1)!} \int y f^{(s,0)}(x,y)\,dy \int K(t) \int_x^{x+\delta t} (x+\delta t-u)^{s-1}\,du\,dt = \dfrac{h_s(x)}{s!} \int t^s K(t)\,dt$,

and we conclude that, as $n \to \infty$,

(3.6)  $E\hat h(x) - h(x) \sim \delta^s \dfrac{h_s(x)}{s!} \int t^s K(t)\,dt$.

Now we will evaluate the variance of $\hat h$. By a change
of variable we see that

(3.7)  $\delta^{-1} E[Y_1 K((X_1 - x)/\delta)]^2 = \int\int K^2(t) y^2 f(x+\delta t, y)\,dt\,dy$.

Since p is continuous at x, by arguments similar to
those given in Lemma 1 of Parzen (1962), the r.h.s. of
(3.7) is asymptotically equivalent to $p(x)\int K^2$. Further,
since

$\delta^{-1}[E Y_1 K((X_1 - x)/\delta)]^2 = \delta \left( \int\int y K(t) f(x+\delta t, y)\,dt\,dy \right)^2 = \delta (E\hat h(x))^2$

by (3.3), we have from (3.5), $\delta^{-1}[E Y_1 K((X_1 - x)/\delta)]^2 = o(1)$.
Thus since $(X_1,Y_1),\ldots,(X_n,Y_n)$ are i.i.d., we conclude
that

(3.8)  $\mathrm{var}(\hat h(x)) \sim (n\delta)^{-1} p(x) \int K^2$.

Now (3.6) and (3.8) give (3.2). This completes the
proof of Lemma 3.2.
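
The bias approximation (3.6) can be sanity-checked numerically in a simple case. The sketch below is illustrative only and rests on assumptions not made in the paper: X is standard normal, E(Y | X = x) = x, K is the standard normal density (so s = 2 and $\int t^2 K(t)\,dt = 1$), and f is smooth enough that $h_s$ coincides with the second derivative of $h(x) = x\varphi(x)$. Integrating y out of (3.3) gives $E\hat h(x) = \int K(t) h(x+\delta t)\,dt$, which is evaluated here by a midpoint rule.

```python
import math

def phi(u):
    # Standard normal density, used both as the kernel K and as the density of X.
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def h(x):
    # h(x) = r(x) g(x) = x * phi(x) under the assumed model.
    return x * phi(x)

def h_second(x):
    # Second derivative of x * phi(x), playing the role of h_s with s = 2.
    return (x ** 3 - 3.0 * x) * phi(x)

def expected_h_hat(x, delta, lo=-10.0, hi=10.0, steps=20000):
    # Midpoint-rule value of the integral of K(t) h(x + delta t) over [lo, hi].
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        t = lo + (i + 0.5) * width
        total += phi(t) * h(x + delta * t)
    return total * width

def leading_bias(x, delta):
    # Right side of (3.6) with s = 2: delta^2 * h_2(x) / 2! * 1.
    return delta ** 2 * h_second(x) / 2.0

if __name__ == "__main__":
    x = 1.0
    for delta in (0.2, 0.1, 0.05):
        exact = expected_h_hat(x, delta) - h(x)
        print(delta, exact / leading_bias(x, delta))
```

The printed ratio of the exact bias to the leading term should approach one as the bandwidth shrinks.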


Lemma 3.3. If $g_s$ is continuous at x, then

(3.9)  $\mathrm{MSE}(\hat g(x)) \sim \left[ \left( \dfrac{\eta^s}{s!} g_s(x) \int t^s K(t)\,dt \right)^2 + (n\eta)^{-1} g(x) \int K^2 \right]$,

and if, instead, $g^{(s)}$, the $s$th order derivative of g,
is continuous at x, then (3.9) holds with $g_s$ replaced
by $g^{(s)}$.


Proof.  Proof  of  (3.9)  follows  by  arguments  given  for  (3.2). 


Remark 3.1. Taking $\delta$ and $\eta$ proportional to $n^{-1/(2s+1)}$,
we see from (3.2) and (3.9) that MSE($\hat h$) and MSE($\hat g$) are
both of the order $O(n^{-2s/(1+2s)})$. The value of $\delta$ that
minimizes the rhs of (3.2) and that of $\eta$ that minimizes
the rhs of (3.9) are, nevertheless, given by

(3.10)  $\delta^* = \left[ \dfrac{n^{-1} p(x) \int K^2}{2s \left( h_s(x) \int t^s K(t)\,dt / s! \right)^2} \right]^{1/(1+2s)}$

and

(3.11)  $\eta^* = \left[ \dfrac{n^{-1} g(x) \int K^2}{2s \left( g_s(x) \int t^s K(t)\,dt / s! \right)^2} \right]^{1/(1+2s)}$

respectively. Using these optimal values of $\delta$ and $\eta$ one
can easily obtain the asymptotic values of the mean square
errors of $\hat h$ and $\hat g$ which are minimum over the class of
all windowwidth functions $\delta$ and $\eta$. However, since the
exact value of the ratio $p(x)/h_s^2(x)$ for $\delta^*$ and of the
ratio $g(x)/g_s^2(x)$ for $\eta^*$ are not known, only approximate
values of $\delta^*$ and $\eta^*$ (obtained from approximate values of
these ratios) can be used in practice. The expression for
$\eta^*$ is noted in Rosenblatt (1956) (for s = 2) and in
Singh (1979) for general s, among many others.
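
The optimal bandwidth of (3.10) is the minimizer of a two-term expression of the form $A^2\delta^{2s} + B/(n\delta)$, with A standing for $h_s(x)\int t^s K(t)\,dt/s!$ and B for $p(x)\int K^2$. The following sketch, under the assumption that A and B are supplied as known numbers (the names `bias_const` and `var_const` are hypothetical), computes the closed-form minimizer and compares it against nearby bandwidths.

```python
def mse_proxy(delta, n, s, bias_const, var_const):
    # Leading terms of (3.2): squared bias plus variance.
    return (bias_const * delta ** s) ** 2 + var_const / (n * delta)

def optimal_delta(n, s, bias_const, var_const):
    # Closed-form minimizer, as in (3.10):
    # delta* = [ var_const / (2 s n bias_const^2) ]^(1 / (1 + 2 s))
    return (var_const / (2.0 * s * n * bias_const ** 2)) ** (1.0 / (1 + 2 * s))

if __name__ == "__main__":
    n, s, a_const, b_const = 1000, 2, 0.5, 0.3
    d_star = optimal_delta(n, s, a_const, b_const)
    for factor in (0.5, 0.9, 1.0, 1.1, 2.0):
        d = factor * d_star
        print(factor, d, mse_proxy(d, n, s, a_const, b_const))
```

In practice A and B depend on the unknown f, so only rough plug-in values would be available, as the remark notes.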

Proof of Theorem 3.1. Writing $|\hat r - r| = |(\hat r - \{r\}_{c_n}) + (\{r\}_{c_n} - r)|$
we have with probability one,

$|\hat r - r| \le (|\hat h/\hat g - r| \wedge c_n) + |r| I(|r| > c_n)$.

Hence by Lemma 3.1,

(3.12)  $E(\hat r - r)^2 \le 16 g^{-2} [E(\hat h - h)^2 + \tfrac{3}{2}\max(|r|^2, c_n^2) E(\hat g - g)^2] + 2|r|^2 I(|r| > c_n)$.

Now since $c_n \to \infty$ as $n \to \infty$, there exists an $n_0 = n_0(x)$
such that for all $n \ge n_0$, $c_n(x) \ge |r(x)|$ and the second
term on the rhs of (3.12) is equal to zero for all $n \ge n_0$.
The rest of the proof is now an immediate consequence of
(3.2) and (3.9).


Remark 3.2. Notice that (3.12) gives an explicit bound for
each sample size for the mean square error of the estimator
of the regression curve in terms of MSE($\hat h$) and MSE($\hat g$).
Exact asymptotic expressions for these terms are in turn
presented in (3.2) and (3.9) respectively. Hence the
exact asymptotic value of the bound (3.12) for MSE($\hat r$)
is at hand. To the best of our knowledge an explicit
bound with an exact asymptotic value for the MSE of a
nonparametric regression curve estimate, of whatsoever
nature it may be, is lacking in the existing literature,
in spite of a large number of articles on the subject.

Remark 3.3. From Theorem 3.1 it follows that if $\delta$ and
$\eta$ are chosen in a way so that

(3.13)  $\delta \sim \eta = O(n^{-1/(1+2s)})$,

then $\gamma_n$ defined in Theorem 3.1 is of the order

(3.14)  $\gamma_n = O(n^{-2s/(1+2s)})$

and

(3.15)  $\mathrm{MSE}(\hat r(x)) = O(n^{-2s/(1+2s)} c_n^2)$.

Remark 3.4. As pointed out earlier, if there is a known
$c_0(x)$ such that $|r(x)| \le c_0(x)$, we would instead consider
estimating r by $r^*$ defined in Section 2. It follows
from the proof of Theorem 3.1 that

(3.16)  $E(r^*(x) - r(x))^2 = O(\gamma_n)$.

Thus $r^*$ achieves a MSE rate of convergence better than
that of $\hat r$.

The following (3.17) and (3.18) are immediate consequences
of (3.15) and (3.16).


Corollary 3.1. (Weak consistency). Under the conditions
of Theorem 3.1 and (3.13), for every sequence $a_n \to \infty$,

(3.17)  $|\hat r(x) - r(x)| = o(n^{-s/(1+2s)} c_n(x) a_n)$ in prob.

and

(3.18)  $|r^*(x) - r(x)| = o(n^{-s/(1+2s)} a_n)$ in prob.


Remark 3.5. It is clear from the results in (3.0), (3.14),
(3.17) and (3.18) that the larger the s, the better the rate
of convergence. However, choosing a larger value of s
means putting more restrictions on h and g. Further,
any choice of s more than 4 or 5 makes the computation
of $\hat h$ and $\hat g$ difficult. It is seen quite often in the
case of density estimates that the improvement in the rate
of convergence with an s being 5 or more is not significant
compared to the extra difficulty one incurs in the
computation of the estimates. The same is expected in the
case of regression estimates.
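
The diminishing return from increasing s can be seen directly from the exponent 2s/(1+2s) in (3.14). The short sketch below tabulates that exponent and its increment from s to s + 1; the function names are illustrative and exact rational arithmetic is used only for clarity.

```python
from fractions import Fraction

def mse_rate_exponent(s):
    # Exponent a in the MSE rate n^(-a), with a = 2s / (1 + 2s) as in (3.14).
    return Fraction(2 * s, 1 + 2 * s)

def marginal_gain(s):
    # Increase in the exponent when moving from s to s + 1.
    return mse_rate_exponent(s + 1) - mse_rate_exponent(s)

if __name__ == "__main__":
    for s in range(1, 8):
        print(s, float(mse_rate_exponent(s)), float(marginal_gain(s)))
```

The exponent stays below one for every s, and each further increase of s adds less than the previous one.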


4.  UNIFORM  CONSISTENCIES


In Section 3 we proved the mean square consistency and
deduced the consistency in probability of the estimators
$\hat r$ and $r^*$ at a point x, and in each case we investigated
the speed of convergence. In this section we prove
the uniform mean square consistency as well as the uniform
weak consistency of $\hat r$ and $r^*$. The following theorem follows
directly from the proof of Theorem 3.1.


Theorem 4.1. Let B be any subset of the real line such
that $\inf_{x\in B} g(x) > 0$ and $\sup_{x\in B}|r(x)| < \infty$ (the bounds
in respective cases need not be known), and p, $h_s$ and
$g_s$ are uniformly continuous on B. Then

(4.1)  $\sup_{x\in B} E(\hat r(x) - r(x))^2 = O(\gamma_n c_n^{*2})$

where $c_n^* = \sup_{x\in B} c_n(x)$, and $\gamma_n$ is as defined in
Theorem 3.1. Also

(4.2)  $\sup_{x\in B} E(r^*(x) - r(x))^2 = O(\gamma_n)$.

Thus if $\delta$ and $\eta$ are proportional to $n^{-1/(1+2s)}$, then

(4.1)'  $\sup_{x\in B} \mathrm{MSE}(\hat r(x)) = O(n^{-2s/(1+2s)} c_n^{*2})$,

(4.2)'  $\sup_{x\in B} \mathrm{MSE}(r^*(x)) = O(n^{-2s/(1+2s)})$.

The result (4.1) or (4.2) does not however prove the
uniform weak consistency of $\hat r$ or $r^*$. If the characteristic
function of K is absolutely integrable and $E|Y| < \infty$,
then it can be shown (e.g. see Singh and Ullah (1985)) that

(4.3)  $E\{\sup_x |\hat h(x) - E\hat h(x)|\} = O((n\delta)^{-1/2})$.

Hence it follows from Lemma 3.2 that if $h_s$ and p are
uniformly continuous on B, then

(4.4)  $E\{\sup_{x\in B} |\hat h(x) - h(x)|\} = O(\max\{\delta^s, (n\delta)^{-1/2}\})$,

which in turn implies that

$\sup_{x\in B} |\hat h(x) - h(x)| = O(\max\{\delta^s, (n\delta)^{-1/2}\})$ in prob.

Similarly, if the characteristic function of K is absolutely
integrable and $g_s$ is uniformly continuous, then

(4.5)  $E\{\sup_{x\in B} |\hat g(x) - g(x)|\} = O(\max\{\eta^s, (n\eta)^{-1/2}\})$

and

$\sup_{x\in B} |\hat g(x) - g(x)| = O(\max\{\eta^s, (n\eta)^{-1/2}\})$ in prob.

To deduce the uniform weak consistency of $\hat r$ and $r^*$
from the above analysis, notice that as in the proof of
Theorem 3.1, $|\hat r - r|$ is bounded a.s. by $(|\hat h/\hat g - h/g| \wedge c_n)
+ |r| I(|r| > c_n)$, and the proof of the lemma in the
Appendix of Singh (1977b) gives

$E \sup_{x\in B} \left( \left| \dfrac{\hat h(x)}{\hat g(x)} - \dfrac{h(x)}{g(x)} \right| \wedge c_n(x) \right) \le 2 (\inf_{x\in B} g(x))^{-1} \{ E \sup_{x\in B} |\hat h(x) - h(x)| + (\sup_{x\in B} |r(x)| + c_n^*) E \sup_{x\in B} |\hat g(x) - g(x)| \}$.

Further, there exists an $n_0$ such that for all $n \ge n_0$,
$\sup_{x\in B} |r(x)| I(|r(x)| > c_n(x)) = 0$ (this follows because
$\sup_{x\in B} |r(x)| < \infty$, though the upper bound need not be known,
and $c_n(x) \to \infty$ for each x in B). From these analyses
and (4.4) and (4.5) we conclude the following theorem.

Theorem 4.2. Let $E|Y| < \infty$ and, for a subset B of the
real line, let the hypothesis of Theorem 4.1 hold. Then

(4.6)  $E\{\sup_{x\in B} |\hat r(x) - r(x)|\} = O(\gamma_n^{1/2} c_n^*)$

and

(4.7)  $E\{\sup_{x\in B} |r^*(x) - r(x)|\} = O(\gamma_n^{1/2})$.

Thus from (4.6), $\sup_{x\in B} |\hat r(x) - r(x)| = O(\gamma_n^{1/2} c_n^*)$ in
probability, and from (4.7), $\sup_{x\in B} |r^*(x) - r(x)| = O(\gamma_n^{1/2})$
in probability. Taking $\delta$ and $\eta$ proportional to $n^{-1/(1+2s)}$,
$\gamma_n$ is of the order $n^{-2s/(1+2s)}$.


5.  SOME  CONCLUDING  REMARKS 

The choice of $c_n$ in the definition of our estimator
$\hat r$ is completely arbitrary, and it is not possible to give
an explicit formula to determine a value of $c_n$ which may
fit well in all practical situations. If, however, in a
particular situation, we have some knowledge, say $A_0$, of
the range of the possible values of the response variable
Y, we may choose $c_n(x) = A_0 a_n$, where $a_n$ is a sequence
converging slowly to infinity, something like
log n or loglog n (depending on how good our knowledge
about the range of Y is). In any case, $c_n$ must be
chosen so that $n^{-s/(1+2s)} c_n \to 0$ as $n \to \infty$.
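
The requirement that $n^{-s/(1+2s)} c_n$ vanish can be checked for a candidate truncation sequence. The sketch below assumes a guessed range bound (the parameter `a0` is hypothetical) multiplied by log n or loglog n, and evaluates the product at increasing sample sizes; it is an illustration, not a recommendation from the paper.

```python
import math

def truncation_level(n, a0, kind="log"):
    # c_n = a0 * a_n with a_n = log n or loglog n (n >= 3 keeps loglog n positive).
    if kind == "log":
        a_n = math.log(n)
    elif kind == "loglog":
        a_n = math.log(math.log(n))
    else:
        raise ValueError("kind must be 'log' or 'loglog'")
    return a0 * a_n

def rate_times_level(n, s, c_n):
    # The quantity n^(-s / (1 + 2s)) * c_n, which has to tend to zero.
    return n ** (-s / (1.0 + 2.0 * s)) * c_n

if __name__ == "__main__":
    for n in (10 ** 3, 10 ** 6, 10 ** 9):
        c_n = truncation_level(n, a0=1.0, kind="log")
        print(n, c_n, rate_times_level(n, s=2, c_n=c_n))
```

Both slowly growing choices satisfy the requirement, since any power of n eventually dominates a logarithm.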


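Examining the asymptotic expressions of MSE($\hat h$) and
MSE($\hat g$) obtained in Section 3, we remark that one should
choose K so that $|\int t^s K(t)\,dt|$ and $\int K^2(t)\,dt$ are as
small as possible. This is also the case even if one
uses the optimal $\delta$ and $\eta$ given in (3.10) and (3.11)
respectively, since with these choices of $\delta$ and $\eta$,

$\mathrm{MSE}(\hat h(x)) \sim n^{-2s/(1+2s)} w_1(x)$ and $\mathrm{MSE}(\hat g(x)) \sim n^{-2s/(1+2s)} w_2(x)$,

where

$w_1(x) = (1+2s) \left\{ \dfrac{|h_s(x) \int t^s K(t)\,dt|}{s!} \left( \dfrac{p(x) \int K^2}{2s} \right)^{s} \right\}^{2/(1+2s)}$

and

$w_2(x) = (1+2s) \left\{ \dfrac{|g_s(x) \int t^s K(t)\,dt|}{s!} \left( \dfrac{g(x) \int K^2}{2s} \right)^{s} \right\}^{2/(1+2s)}$.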
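
The closed form for the minimized mean square error can be verified against a direct evaluation at the optimal bandwidth. In the sketch below, A stands for $|h_s(x)\int t^s K(t)\,dt|/s!$ and B for $p(x)\int K^2$; both are passed in as plain numbers, and the names `bias_const`, `var_const` and `w_closed_form` are hypothetical labels used only for this illustration.

```python
def optimal_delta(n, s, bias_const, var_const):
    # Minimizer of bias_const^2 * delta^(2s) + var_const / (n * delta), as in (3.10).
    return (var_const / (2.0 * s * n * bias_const ** 2)) ** (1.0 / (1 + 2 * s))

def mse_proxy(delta, n, s, bias_const, var_const):
    # Leading terms of the mean square error: squared bias plus variance.
    return (bias_const * delta ** s) ** 2 + var_const / (n * delta)

def w_closed_form(s, bias_const, var_const):
    # (1 + 2s) * { |A| * (B / (2s))^s }^(2 / (1 + 2s))
    inner = abs(bias_const) * (var_const / (2.0 * s)) ** s
    return (1 + 2 * s) * inner ** (2.0 / (1 + 2 * s))

if __name__ == "__main__":
    s, a_const, b_const = 2, 0.5, 0.3
    for n in (100, 10000):
        d_star = optimal_delta(n, s, a_const, b_const)
        direct = mse_proxy(d_star, n, s, a_const, b_const)
        closed = n ** (-2.0 * s / (1 + 2 * s)) * w_closed_form(s, a_const, b_const)
        print(n, direct, closed)
```

Since the minimized value increases with both A and B, a kernel with smaller $|\int t^s K(t)\,dt|$ and $\int K^2$ gives a smaller leading constant.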


Now examining the optimal values of $\delta$ and $\eta$ given
in (3.10) and (3.11), we remark that $\delta$ and $\eta$ should be
proportional to $n^{-1/(1+2s)}$. (This has been pointed out
in a number of articles on density estimates dealing with
rates of convergence, e.g. Singh (1977, 1979).) Examining
the estimate $\hat g$, we see that var($\hat g$) will be large
whenever var($K((X_j - x)/\eta)$) is large, which in turn
will be inflated when var($X_j$) = $\sigma_X^2$ (say) is large. To
control this (and hence to control var($\hat g$)) to some
extent, we remark that $\eta$ should also be proportional to
$\sigma_X$, that is, if possible, $\eta$ should be taken to be $\sigma_0 \eta'$,
where $\sigma_0$ is a good guess of $\sigma_X$ and $\eta'$ is proportional
to $n^{-1/(1+2s)}$. We have the same view with $\delta$ as well.